Homework 9.2

Krystin, Rohan, Joe

(a)

Krystin:

We can consider two models for growth of an individual cell, linear growth and exponential growth.

Linear growth

We will start with linear growth; stating that the growth is linear as the model. More precisely, we model bacterial growth as a constant process for each bacterium; it grows at the same rate regardless of bacterial mass. We can mathematize our model as

\begin{align} a(t) = a^0(1 + k t), \end{align}

where $a(t)$ is the area of the bacterium over time, and $t$ is the time since the last cell division. So, we now have our mathematical model. The growth rate is $a^0 * k$, and the area immediately after the last cell division is $a^0$.

For the statistical model, we need to model error in measurement. The idea is that the cell grows according to the above equation, but there will be some natural stochastic variation away from that curve. Furthermore, there are errors in measurement for the area at each time point. (We assume that we can measure the time exactly without error.) Thus, the measured area $a_i$ for a bacterium at time point $t_i$ is

\begin{align} a_i = a^0(1 + k t_i) + e_i, \end{align}

where $e_i$ is the variation in the measurement from the mathematical model, a residual. We need to specify how $e_i$ is distributed, and also the relationship between different time points. We first consider the latter. In time series analysis, the value (in this case the area) at time point $t_{i+1}$ may be influenced by some memory process by the value at time point $t_i$. Nonetheless, we often model measurements at different time points as i.i.d., only being connected with those at previous times by virtue of the fact that there is explicit time dependence in the mathematical model. This is typically a reasonable assumption, as many processes are memoryless.

Given that the measurements are i.i.d., we can model the residual, $e_i$. This is commonly modeled as Normal with mean zero and some finite variance. So, if we assume homoscedastic error, we could write

\begin{align} f(e_i;\sigma) = \frac{1}{\sqrt{2\pi \sigma^2}} \mathrm{e}^{-e_i^2/2\sigma^2}\;\forall i. \end{align}

We can then write the PDF for the joint distribution of all of the measured data,

\begin{align} f(\mathbf{a};\mathbf{t},a^0, k, \sigma) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2}\exp\left[\frac{-1}{2\sigma^2}\sum_{i=1}^n(a_i - a^0-a^0 kt_i)^2\right]. \end{align}

It is convenient to write this in shorthand.

\begin{align} &a_i = a^0(1 + k t_i) + e_i \;\forall i,\\ &e_i \sim \text{Norm}(0, \sigma)\;\forall i, \end{align}

or, equivalently,

\begin{align} a_i \sim \text{Norm}(a^0(1 + k t_i), \sigma)\;\forall i. \end{align}

Exponential growth

We can use exactly the same logic as above to write the model for Exponential growth.

\begin{align} &a_i = a^0 \mathrm{e}^{kt} + e_i \;\forall i,\\ &e_i \sim \text{Norm}(0, \sigma)\;\forall i, \end{align}

or, equivalently,

\begin{align} a_i \sim \text{Norm}(a^0 \mathrm{e}^{kt}, \sigma)\;\forall i. \end{align}

(b)

In [836]:
# Colab setup -----------`-------
import os, sys, subprocess
if "google.colab" in sys.modules:
    cmd = "pip install --upgrade iqplot datashader bebi103 watermark"
    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    data_path = "https://s3.amazonaws.com/bebi103.caltech.edu/data/"
else:
    data_path = "../data/"
# ------------------------------
try:
    import multiprocess
except:
    import multiprocessing as multiprocess

import warnings
    
import numpy as np
import pandas as pd

import bebi103
import iqplot
import scipy
import scipy.stats as st
import holoviews as hv
import holoviews.operation.datashader
import datashader
import bokeh.palettes


hv.extension('bokeh')

import bokeh
bebi103.hv.set_defaults()
bokeh.io.output_notebook()

import numpy.random
rg = numpy.random.default_rng()
import random
import numba
import panel as pn
Loading BokehJS ...

First we start with some exploratory data analysis. We load in our data and then visualize the area for bacterium 1 and 2. We see that the distribution of these are very similar.

In [837]:
df = pd.read_csv(data_path + "caulobacter_growth_events.csv")
df.head()
Out[837]:
time (min) area (µm²) bacterium growth event
0 1.0 1.300624 1 0
1 2.0 1.314144 1 0
2 3.0 1.295216 1 0
3 4.0 1.314144 1 0
4 5.0 1.341184 1 0
In [838]:
p = iqplot.ecdf(df, 
                q = "area (µm²)", 
                cats = ["bacterium"])

bokeh.io.show(p)

p = iqplot.stripbox(df, 
                    q = "area (µm²)", 
                    cats = ["bacterium"],
                    top_level = "box",
                    y_axis_label = 'Bacterium',
                    whisker_kwargs=dict(line_color='#00000f', line_width=.8),
                    box_kwargs=dict(line_color='#00000f', line_width=.8),
                    median_kwargs=dict(line_color='#00000f', line_width=.8),
                    marker_kwargs=dict(alpha=0.3), 
                    jitter = 1)
bokeh.io.show(p)

We then decided to plot time (min) vs. area($\mu M^2$). We saw that over time, the area would increase and spike, then reset. This is because for each new growth event, the bacterium starts growing and once it divides, a new growth event begins and the area that we are measuring resets. Because because we want to get good and consistent parameter estimates for $a_0$ and $k$, we made a new dataframe where for each growth event, the time was reset to start from 0.

In [839]:
for i in [1,2]:
    p = bokeh.plotting.figure(
    width=740,
    height=300,
    x_axis_label="time (min)",
    y_axis_label="area (µm²)",
    title = "Time vs. Area bacteria {}".format(i)
    )
    
    p.line(
        source = df[df["bacterium"] == i],
        x= "time (min)",
        y= "area (µm²)",
        #size = 1,
        #color=colors[i-1],
        legend_label = str(i)
    )

    #p.legend.title = "Bacterium repeats"
    p.legend.location = 'top_right'
    p.legend.click_policy = "hide"
    bokeh.io.show(p)

We notice that for each growth event, we see a sharp increase in area, then it returns to a baseline for the start of the following growth event. Due to this we will reset the time (min) to start at 0 for each new growth event, as we are assuming them iid.

In [840]:
## Reset time to be 0 for each new growth event.
df_1 = df[df["bacterium"] == 1]
df_2 = df[df["bacterium"] == 2]

import warnings
warnings.filterwarnings('ignore')
#warnings.filterwarnings(action='once')

for ge in np.unique(df_1["growth event"]):
    minimum = np.min(df_1.loc[df_1["growth event"] == ge, "time (min)"])
    df_1.loc[df_1["growth event"] == ge, "time (min)"] -=  minimum
    
for ge in np.unique(df_2["growth event"]):
    minimum = np.min(df_2.loc[df_2["growth event"] == ge, "time (min)"])
    df_2.loc[df_2["growth event"] == ge, "time (min)"] -=  minimum
In [841]:
def a_lin(a_0, k, t):
    '''Computes a(t) for linear growth model.'''
    return a_0 * (1 + k*t)
def a_exp(a_0, k, t):
    '''Computes a(t) for exponential growth model.'''
    return a_0 * np.exp(k*t)
In [842]:
def resid_lin(params, t, A):
        """Residual for Linear growth model."""
        return A - a_lin(*params, t)
    
def resid_exp(params, t, A):
        """Residual for exponential growth model."""
        return A - a_exp(*params, t)

    
def mle_iid_lin(data):
    '''Computes MLE extimates for a_0, k for linear model.'''
    t, A = data[:, 0], data[:,1]
    res = scipy.optimize.least_squares(
        resid_lin, 
        np.array([1, .05]), 
        args=(t, A), 
        bounds=([.001, -np.inf], [np.inf, np.inf])
    )
    
    mle = res.x
    
    return mle

def mle_iid_exp(data):
    '''Computes MLE extimates for a_0, k for exponential model.'''
    t, A = data[:, 0], data[:,1]
    res = scipy.optimize.least_squares(
        resid_exp, 
        np.array([1, .05]), 
        args=(t, A), 
        bounds=([.001, -np.inf], [np.inf, np.inf])
    )
    
    mle = res.x
    
    return mle

def gen_fun_lin(a_0, k, sigma, t,  size = 1):
    '''Generates sample A using linear growth model using times from gven dataset.'''
    a_gen =  a_lin(a_0, k, t) + np.random.normal(loc = 0, scale = sigma, size = t.size)
    return a_gen

def gen_fun_exp(a_0, k, sigma, t,  size = 1):
    '''Generates sample A using exponential growth model using times from gven dataset.'''
    a_gen =  a_exp(a_0, k, t) + np.random.normal(loc = 0, scale = sigma, size = t.size)
    return a_gen
In [843]:
def plot_mles(mles):
    '''Plots a_0 and k for each growth event for both bacteria given dictionary of MLE DataFrames'''
    hv.extension("bokeh")
    
    bact_1 = mles[1]
    bact_2 = mles[2]
    
    graphs = []
    
    for i, model in enumerate(["Linear", "Exponential"]):

        p1 = hv.Points(
            data = bact_1[i],
            kdims = ["a_0", "k"],
            label = "Bact. 1"
            #vdims = ["growth event"]
        ).opts(
            title = "{} Growth model".format(model),
            color = "blue",
            tools=['hover'])

        p2 = hv.Points(
            data = bact_2[i],
            kdims = ["a_0", "k"],
            label = "Bact. 2"
            #vdims = ["growth event"]
        ).opts(
            title = "k MLE",
            color = "red",
            tools=['hover'])
        
        g = p1 * p2
        g.opts(legend_position = "top_right")
        graphs.append(g)
    
    
    return graphs[0] + graphs[1]
In [844]:
#hv.extension("bokeh")
dfs = [df_1, df_2]
mles = {}
samps = {}

for bact in [1, 2]:    
    samps_lin = {}
    samps_exp = {}

    mles_lin = []
    mles_exp = []

    #bact = 1
    d = dfs[bact-1]

    print("Computing bacteria {} MLEs".format(bact))
    events = np.unique(d["growth event"])
    for ge in events:
        sub_df = d[d["growth event"] == ge]
        data = sub_df[["time (min)", "area (µm²)"]].values
        mle_lin = mle_iid_lin(data)
        mle_exp = mle_iid_exp(data)

        # Find standard deviation MLE of residuals 
        t, A = data[:, 0], data[:,1]
        sigma_lin = np.std(resid_lin(mle_lin, t, A))
        sigma_exp = np.std(resid_exp(mle_exp, t, A))

        #generate samples
        num_samps = 1000
        samps_lin[ge] = [gen_fun_lin(mle_lin[0], mle_lin[1], sigma_lin, t) for i in range(num_samps)]
        samps_exp[ge] = [gen_fun_exp(mle_exp[0], mle_exp[1], sigma_exp, t) for i in range(num_samps)]


        mles_lin.append([ge, mle_lin[0], mle_lin[1]])
        mles_exp.append([ge, mle_exp[0], mle_exp[1]])

    # Combine mle datasets into one dataframe for each    
    mles_lin = pd.DataFrame(data = mles_lin, columns = ["growth event", "a_0", "k"])#.set_index("growth event")
    mles_exp = pd.DataFrame(data = mles_exp, columns = ["growth event", "a_0", "k"])#.set_index("growth event")

    mles[bact] = [mles_lin, mles_exp]
    samps[bact] = [samps_lin, samps_exp]
    

plot_mles(mles)
    
Computing bacteria 1 MLEs
Computing bacteria 2 MLEs
Out[844]:

For each growth event, we used scipy.optimize.least_squares() to minimize the residual of $f(\mathbf{a};\mathbf{t},a^0, k, \sigma)$ find the MLE parameters $\hat{a_0}, \hat{k}$ for both the linear and exponential models. We then found $\sigma$, which is the plug in MLE for the standard deviation of the residuals. The results are graphed above. We see that the $a_0$ MLE values are largely the same for the exponential and linear models, which makes sense given that it represents the starting area. The $k$ MLEs differ between models, but show lower variance within the exponential model. This indicates that the exponential model may fit the data better, because K is more similar for each growth event.

For the linear and exponential models, we simulated the generative distribution by drawing from $a_i \sim \text{Norm}(\hat{a^0}(1 + \hat{k} t_i), \sigma)$ and $a_i \sim \text{Norm}(\hat{a^0} \mathrm{e}^{\hat{k}t}, \sigma)\;\forall i$ respectively.

c)

Model Comparison with Predictive Regression

To compare the linear and exponential growth model, we decided to graphically assess them. We computed the MLE of the parameters $a_0$ and $k$ above and made predictive regression plots. In the plot below, we can see how the data points tend to fit the exponential model much better. We plotted against the measured data both 1. the predictive regression and 2. the difference between the the data and the median area. To do this, we drew samples from both generative distributions for bacteria 1 and 2 for the area values that were actually measured. We see that in the differences plot, the measurements consistently deviate from the model, with the difference in area dipping around the middle time value and increasing at the tails, suggesting that the linear model is missing time dependence. On the other hand, there does not seem to be a systematic way in which measurements deviate from the exponential model and looks independent of time.

In [845]:
## Dashbord Plotting Functions and widgets
def plot_pred_reg(ge, samps, d, title = "linear growth"):
    '''Returns predictive regression plot for data d, samples s and growth event ge.'''
    samples = np.array(samps[ge])
    times = d.loc[d["growth event"] == ge, ["time (min)"]].values

    p = bebi103.viz.predictive_regression(
        samples=samples,
        samples_x=times.reshape(times.size, 1),
        data=d.loc[d["growth event"] == ge, ["time (min)", "area (µm²)"]].values,
        #percentiles = [95, 65],
        x_axis_label='time (min)',
        y_axis_label='area (µm^2)',
        title = title,
        width = 250,
        frame_height = 250
    )
    
    return p
    
def plot_pred_reg_diff(ge, samps, d, title = "linear growth"):
    '''Returns predictive regression difference plot for data d, samples s and growth event ge.'''
    samples = np.array(samps[ge])
    times = d.loc[d["growth event"] == ge, "time (min)"].values
    p = bebi103.viz.predictive_regression(
        samples=samples,
        samples_x=times,
        data=d.loc[d["growth event"] == ge, ["time (min)", "area (µm²)"]].values,
        diff=True,
        x_axis_label='time (min)',
        y_axis_label='diff. area (µm^2)',
        frame_height = 250
        #kwargs=dict(plot_width=400, plot_height=400)
    )
    return p


# Dashboard Widgets
bact_selector = pn.widgets.Select(
    name="Bacterium", options=[1,2], value=1
)
max_ges = {1:int(np.max(df_1["growth event"])), 2:int(np.max(df_2["growth event"]))}
slider = pn.widgets.IntSlider(name='Growth event', start=0, end=max_ges[2], step=1, value=0)

@pn.depends(
    bact = bact_selector.param.value,
    ge = slider.param.value
)
def plot_pred_reg_interactive_lin(ge, bact):
    if bact == 1 and ge > 19:
        ge = 19
    samps_lin = samps[bact][0]
    d = dfs[bact-1]
    return plot_pred_reg(ge, samps_lin, d)

@pn.depends(
    bact = bact_selector.param.value,
    ge = slider.param.value
)
def plot_pred_reg_interactive_lin_diff(ge, bact):
    if bact == 1 and ge > 19:
        ge = 19
    samps_lin = samps[bact][0]
    d = dfs[bact-1]
    return plot_pred_reg_diff(ge, samps_lin, d)

@pn.depends(
    bact = bact_selector.param.value,
    ge = slider.param.value,
)
def plot_pred_reg_interactive_exp(ge, bact):
    if bact == 1 and ge > 19:
        ge = 19
    samps_exp = samps[bact][1]
    d = dfs[bact-1]
    return plot_pred_reg(ge, samps_exp, d, title = "exponential growth")

@pn.depends(
    bact = bact_selector.param.value,
    ge = slider.param.value,
)
def plot_pred_reg_interactive_exp_diff(ge, bact):
    if bact == 1 and ge > 19:
        ge = 19
    samps_exp = samps[bact][1]
    d = dfs[bact-1]
    return plot_pred_reg_diff(ge, samps_exp, d)

# Create Dashboard
r1 = pn.Row(plot_pred_reg_interactive_lin, plot_pred_reg_interactive_exp)
r2 = pn.Row(plot_pred_reg_interactive_lin_diff, plot_pred_reg_interactive_exp_diff)
pn.Column(r1, r2, slider, bact_selector)
Out[845]:

*Note: Growth events over 19 default to 19 for Bacterium 1.

The above illustrates the model accuracy for the exponential and linear models for each bacteria and given growth event. The top two graphs are predictive regression lines, that draw generative samples from our MLE model and compare them to the actual data points. The bottom plots compare the differences between the MLE model's samples and the actual data, giving a clearer picture of how closely our MLE generative model was to the original data.

We see that for most growth events, the predictive regression plots show the exponential model to fit the data better. The points are mostly within the 95% confidence intervals for the exponential model, whereas they lie outside the 95% confidence intervals of the linear models - especially at the beginning a end of each growth event. We see a similar trend with the difference plots. The exponential model produces consistently smaller differences throughout the growth event. However, the linear model deviates from the sample data significantly more at the the beginning and end of most growth events.

AIC Metrics for Model Comparison

In order to further compare the linear and exponential model for Caulobacter growth, we used the Akaike information criterion aka the AIC. We already calculated the MLEs for our parameters $a_0$ and $k$ so we modeled the log-likelihood of our data $\ell(\theta;\text{data})$ to get the AIC given by

\begin{align} \text{AIC} = -2\ell(a_0^*, k^*;\text{data}) + 2*p, \end{align}

where p=2 parameters. To compare the models, we use our functions for computing the log-likelihood. Then we load in our data and compute the MLEs and log likelihood evaluated at the MLEs for both bacteria at each growth event. The model with less and more meaningful parameters and high likelihood function at the MLE have lower AIC, which is favored.

In [846]:
def log_like_lin(params, t, A):
    #a_0, k = params
    resid = resid_lin(params, t, A)
    sigma = np.std(resid)
    #print(sigma)

    
    like = np.sum(scipy.stats.norm.logpdf(A, loc = a_lin(*params, t), scale = sigma))
    
    
    #print(1/(2*np.pi*sigma**2)**(t.size/2), np.exp(-1/(2*sigma**2) * np.sum(resid**2)))
    return like

def log_like_exp(params, t, A):
    resid = resid_exp(params, t, A)
    
    sigma = np.std(resid)
    #like = 1/(2*np.pi*sigma**2)**(t.size/2) * np.exp((-1/(2*sigma**2))*(np.sum(resid**2)))
    like = np.sum(scipy.stats.norm.logpdf(A, loc = a_exp(*params, t), scale = sigma))
    
    return like
In [847]:
### AIC for Model Comparison
dfs = [df_1, df_2]
df_mle = pd.DataFrame(index = ["alpha_0_lin", "k_lin", "alpha_0_exp", "k_exp",
                               "log_like_lin", "log_like_exp", "AIC_lin", "AIC_exp"])

for bact in [1,2]:
    samps_lin = {}
    samps_exp = {}

    
    d = dfs[bact - 1]

    #print("bacteria {} ".format(bact))

    events = np.unique(d["growth event"])
    for ge in events:
        sub_df = d[d["growth event"] == ge]
        data = sub_df[["time (min)", "area (µm²)"]].values
        
        #MLEs
        alpha_0_lin, k_lin = mle_iid_lin(data)
        alpha_0_exp, k_exp = mle_iid_exp(data)
        
        # Log likelihoods
        ll_lin = log_like_lin((alpha_0_lin, k_lin), t, A)
        ll_exp = log_like_exp((alpha_0_exp, k_exp), t, A)
        
        AIC_lin = -2 * (ll_lin - 3)
        AIC_exp = -2 * (ll_exp - 3)
        
        col = "(b{}, ge{})".format(bact, ge)
        df_mle[col] = [alpha_0_lin, k_lin, alpha_0_exp, k_exp, ll_lin, ll_exp, AIC_lin, AIC_exp]
        
        t, A = data[:, 0], data[:,1]
        
    
df_mle
Out[847]:
(b1, ge0) (b1, ge1) (b1, ge2) (b1, ge3) (b1, ge4) (b1, ge5) (b1, ge6) (b1, ge7) (b1, ge8) (b1, ge9) ... (b2, ge77) (b2, ge78) (b2, ge79) (b2, ge80) (b2, ge81) (b2, ge82) (b2, ge83) (b2, ge84) (b2, ge85) (b2, ge86)
alpha_0_lin 1.248909 1.292365 1.303955 1.313438 1.393821 1.589791 1.633933 1.535399 1.413565 1.453400 ... 1.263082 1.337429 1.186997 1.169559 1.665317 1.243817 1.244197 1.241933 1.206259 1.324927
k_lin 0.008699 0.008738 0.009086 0.009139 0.009394 0.008752 0.008404 0.008503 0.008778 0.008256 ... 0.008613 0.010070 0.008914 0.014359 0.004755 0.009956 0.008297 0.008619 0.009137 0.006815
alpha_0_exp 1.296753 1.337626 1.362179 1.367165 1.463754 1.650341 1.685907 1.580796 1.470991 1.501616 ... 1.338471 1.388258 1.251458 1.282127 1.674529 1.337833 1.285455 1.304986 1.264203 1.330541
k_exp 0.006169 0.006262 0.006276 0.006422 0.006360 0.006224 0.006156 0.006280 0.006183 0.006001 ... 0.005659 0.007106 0.005931 0.008308 0.004038 0.006299 0.006028 0.005834 0.006229 0.006070
log_like_lin 23.259240 14.286393 125.856742 186.055880 -358.115418 -1262.361996 133.216841 -471.942520 -441.329110 179.998963 ... -10689.419737 -246.908506 -225.295348 -76.871780 -23.581595 -39.970540 25.378086 183.500506 219.416825 129.011113
log_like_exp -225.633094 -210.966131 113.341557 188.351506 -1278.872549 -1351.698073 105.098749 -1031.972334 -659.629121 182.552035 ... -2898.160339 -123.147867 -199.688298 -75.045150 -16.913823 -63.250805 33.841756 232.568451 218.885986 -672.863157
AIC_lin -40.518481 -22.572785 -245.713483 -366.111761 722.230836 2530.723993 -260.433682 949.885039 888.658220 -353.997925 ... 21384.839474 499.817013 456.590696 159.743561 53.163190 85.941081 -44.756171 -361.001013 -432.833649 -252.022226
AIC_exp 457.266188 427.932263 -220.683114 -370.703012 2563.745097 2709.396145 -204.197499 2069.944668 1325.258243 -359.104070 ... 5802.320679 252.295734 405.376596 156.090301 39.827645 132.501610 -61.683512 -459.136902 -431.771972 1351.726314

8 rows × 107 columns

In order to compare the AICs, because we have so many different growth events for the two bacteria, we found the sum of the AICs for the linear model and the exponential model. We found that the linear model had a higher sum of AICs by roughly 27,000. This difference is large enough to conclude that the exponential distribution is a better model for the growth of a single Caulobacter cell.

In [848]:
sums = df_mle.sum(axis = 1)
print('''Sum of AICs
Linear: {:.3f}
Exponential: {:.3f}
Difference: {:.3f}'''.format(sums.loc["AIC_lin"],sums.loc["AIC_exp"], sums.loc["AIC_lin"]-sums.loc["AIC_exp"] ))
Sum of AICs
Linear: 105489.459
Exponential: 79976.995
Difference: 25512.464

Plotting the ECDF allowed us to visualize all the AICs for each model for all our data. The ECDFs are similarly distributed, but the linear model has some values that are much larger and thus a worse fit for the measured data.

In [849]:
def ecdf_vals(data):
    ''' Takes in a Numpy array or Pandas Series and will return 
        x and y values for plotting an ecdf'''
    data = np.sort(data)
    return np.asarray([[val, (i+1)/(len(data))] for i, val in enumerate(data) 
                       if i == len(data)-1 or val != data[i+1]])

d_t = df_mle.T

# prepare list of colors

d_t = df_mle.T
p = bokeh.plotting.figure(
width=500,
height=400,
title = "AIC ECDF",
x_axis_label = "AIC",
)

colors = ["red", "blue"]
legend_name = ["Linear", "Exponential"]
for i, lab in enumerate([ecdf_vals(d_t.AIC_lin), ecdf_vals(d_t.AIC_exp)]):
    p.circle(
        x=lab[:,0],
        y=lab[:,1],
        color=colors[i],
        size=2,
        legend_label= legend_name[i]
    )

p.legend.location = 'bottom_right'
p.legend.click_policy = "hide"
bokeh.io.show(p)
In [850]:
%load_ext watermark

%watermark -v -p numpy,pandas,jupyterlab
The watermark extension is already loaded. To reload it, use:
  %reload_ext watermark
CPython 3.8.5
IPython 7.18.1

numpy 1.19.1
pandas 1.1.1
jupyterlab 2.2.6
In [ ]:
 
In [ ]:
 
In [ ]: